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We have developed a fast, accurate and generally applicable method for inferring the power spectrum 
and its uncertainties from maps of the cosmic microwave background (CMB) in the presence of 
inhomogeneous and correlated noise. For maps with 10 4 to 10 5 pixels, we apply an exact power 
spectrum estimation algorithm to submaps of the data at various resolutions, and then combine 
the results in an optimal manner. To analyze larger maps efficiently one must resort to sub- 
optimal combinations in which cross-map power spectrum error correlations are only calculated 
approximately. We expect such approximations to work well in general, and in particular for the 
megapixel maps to come from the next generation of satellite missions. 
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I. INTRODUCTION 

The anisotropy of the Cosmic Microwave Background 
(CMB) is proving to be a powerful cosmological probe [0 . 
Many cosmological parameters, and the primordial power 
spectra of density and gravity-wave perturbations, can 
be inferred from the statistical properties of the CMB — 
in particular from its angular power spectrum Q . Unfor- 
tunately, exact methods for calculating the power spec- 
trum and its uncertainties from real observations are very 
expensive computationally (3). Supercomputers are re- 
quired for analysis of current datasets and even they will 
not be sufficient for the next generation of experiments 
Here we introduce an approximate method for re- 
ducing a CMB map to a power spectrum and its uncer- 
tainties. 

Generally applicable exact methods for finding the an- 
gular power spectrum, C;, that maximize the likelihood 
have operation counts proportional to N 3 where N is the 
number of pixels in the map. Our approach to overcom- 
ing this iV 3 scaling involves a hierarchical decomposition 
of the map into a set of submaps. That is, we subdi- 
vide the original ("primary") map into non-overlapping 
regions, each with a manageable number of pixels, in or- 
der to estimate the power spectrum from each of these 
submaps using an exact algorithm. To study the larger 
angular scale fluctuations we coarsen the primary map 
and if the number of these coarse pixels is still too large, 
we again divide into submaps. To go to yet larger an- 
gular scales, we coarsen the map further, etc... Then we 
calculate the expected correlations between the power 
spectrum estimates from all different submaps at all dif- 
ferent resolution levels in order to optimally average them 
together. A similar multi-grid technique was recently de- 
veloped for the reduction of time-ordered CMB data to 
maps ||. 

Several other approaches to overcoming the N 3 scaling 
have been tried. These include the "pseudo-C/" method 
of M , and the "correlation-function" approach of M . We 



expect these methods to work well in the case of ho- 
mogeneous noise, but to be significantly sub-optimal for 
the levels of inhomogeneity expected in planned observa- 
tional programs. None of these methods has been shown 
to deal properly with correlated noise. Minor modifi- 
cations of the correlation-function approach may make 
this path very attractive, though a remaining issue is the 
importance of noise correlations between pixels. 

The N 3 scaling has been overcome also by a special- 
purpose exact method that is expected to be applicable 
to the maps generated by NASA's Microwave Anisotropy 
Probe (MAP) satellite^. This method [§ assumes the 
noise is not correlated from one pixel to another and 
that the noise level variations are roughly azimuthally 
symmetric. Some of the techniques used in Q may even- 
tually find their place in more generally applicable (and 
yet still exact) power spectrum estimation algorithms, 
though the feasibility is not yet clear. Another special- 
purpose exact algorithm is that of ||, which is applica- 
ble to experiments that scan on rings. The main idea 
is to analyze ring sets instead of maps since both the 
noise and signal covariance structures are simple on the 
rings, whereas the noise structure can be complicated in 
the map space. Although some of its critical hypotheses 
have not been tested yet on realistic data, the ring-set 
approach might still be of practical importance since it 
may provide a useful zeroth-order solution for experi- 
ments that nearly scan on rings. 

In section II we describe our method in detail. In 
section III we present the results of an application to 
a map with ten thousand pixels — comparable to the 
size of maps coming from long-duration balloon (LDB) 
flights. In section IV we show results from a map four 
times larger and discuss prospects for application of our 
method to even larger maps such as those expected from 
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MAP and _PZancfc[j]. In section V we compare with other _ —F^Ti 

methods. In section VI we conclude. ^— ' ? lv 
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II. METHOD 

Here we first describe our method in the simplest con- 
ceptual terms, and then go on to discuss subtleties which 
complicate our implementation. 



A. From the Likelihood Function to the Quadratic 
Estimator 

We describe here the use of a quadratic estimator to 
find the maximum of the likelihood function, and the 
shape of the likelihood function near that maximum, as 
described in fic}| . Time-ordered data from observation of 
the CMB are usually reduced to a set of pixelized maps 
Ai, i = 1,...N which can be decomposed into the sum 
of a signal and a noise contribution, A = s + n . As- 
suming that both the noise and the signal are normally- 
distributed, their statistical properties are fully char- 
acterized by the covariance matrices S = (ss T ) and 
N = (nn T ). Assuming furthermore that the noise and 
signal are not correlated with each other, we can define 



C = <AA T ) = S + N . 



(1) 



The observed sky signal is assumed to be the realiza- 
tion of an isotropic Gaussian random field whose power 
spectrum C\ is the quantity we want to measure. Thus 
we are interested in the likelihood function £(A | C{) 
which is given by 



21n£(A | Ci) = In det C + A T C _1 A 



(2) 



In particular we are interested in the location of the max- 
imum of this function (which is the most likely C{) and 
the curvature at the maximum, — d 2 InC/dCidCi' (which 
is approximately the inverse of the covariance matrix for 
Ci). Note that C depends on C\ since 
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is the Fisher matrix flilfl . 

Equation |] is a quadratic function of the data and 
hence the expression "quadratic estimator" . Note that 
we have suppressed the pixel indices in the various vec- 
tors and matrices. Since ln£ is not equal to its second- 
order Taylor expansion (i.e., C is not a Gaussian in C/), 
some iteration is generally required to reach the likeli- 
hood maximum. 



B. Hierarchical Decomposition and Recombination 

Now let us consider multiple maps and use Greek in- 
dices to label them. Estimates of SCi from map a, SCf, 
are correlated with those from map f3 with correlation 
matrix: 
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where 



(7) 



Note that if a = j3 then Eq. |6| simplifies to the usual 
result: 



(6Ci6Cv) = F^ 1 



(8) 



Given this result, we know how to combine the various 
5Ci estimates from each submap into a final SCi estimate 
from all the submaps in a minimum-variance (optimal) 
manner. The minimum-variance 5Ci> satisfies 



where W is the covariance window function of the exper- 
iment. 

Given an initial estimate of Ci (hereafter, the input 
Ci) one can reach the likelihood maximum as follows. 
By Taylor-expanding In C to second order in SCi around 
C/, and replacing —d 2 h\C/dCidCv with its expectation 
value one can find an expression for SCi such that Ci+SCi 
maximizes the likelihood: 
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and has a weight matrix (inverse of covariance matrix): 
F u > =J2^wi' ■ (10) 

a/3 

Although for simplicity we have written these expres- 
sions for estimating individual C;'s, issues of signal-to- 
noise and spectral resolution usually lead us to estimate 
the power spectrum in bands of £, where the shape of 
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Ci inside the bands is assumed. The usual assumption 
(which we use in our applications) is that 1(1 + l)C//(27r) 
is constant inside the band. 

Our treatment of the correlations of the <5C/'s between 
pairs of maps has been general. The maps may be spa- 
tially separate or overlapping; they may have equivalent 
or different pixel sizes. Thus we have worked out the 
most general solution to optimally combine the power 
spectra of submaps which are the result of hierarchical 
decomposition (HD) of a primary map. 

C. Spectral Resolution 

Even with optimal combining of the power spectrum 
estimates from the various submaps, the HD procedure 
results in a sub-optimal estimation of the power spec- 
trum. Fortunately, in the cases we study, the sub- 
optimal results are quite close to the optimal results. 
Departure from the optimal results is almost entirely due 
to the degraded spectral resolution of the high-resolution 
submaps. This loss of spectral resolution is the primary 
drawback of the HD approach. 

The spectral resolution is most severely degraded at 
the highest resolution levels where the submaps have the 
smallest spatial extents. For any map of linear extent, 
L, it is difficult to distinguish the mode Pi(cos8) from 
Pi+si (cos 0) where SI < ir/L If one wishes to achieve 
a spectral resolution of SI for a square map with linear 
pixel size r p then this map must have n pixels where 

n^2.5xl0 3 (^-V . (11) 
V SI r p J 

Fortunately 51 = 30 and r p = T are expected to be ade- 
quate for LDB-type maps and 2.6 x 10 3 pixels is a small 
enough submap size to allow for reasonable computation 
times (as shown below). 

D. Scaling 

We now calculate how computation time scales with to- 
tal number of pixels in the full-resolution primary map, 
N, and the number of multipole-moment bands, Nb- For 
simplicity we assume that all submaps have the same 
number of pixels, n, and that we estimate the power spec- 
trum for each submap in the same number of bands. Es- 
timating the power spectrum and Fisher matrix for each 
submap takes on the order of N 2 n 3 operations so at the 
finest level we have on the order of N 2 (N/ n)n 3 = N 2 Nn 2 
operations. In a systematic coarsening (such as the one 
below defined by combining four pixels at one resolution 
into one larger pixel for the next coarser level), most of 
the submaps are at the finest resolution and therefore 
analysis and combining of these finest submaps domi- 
nates the demands on memory and CPU time. 



For large enough N, the dominant computational step 
will be in calculating the correlations between submaps. 
The matrix multiplication in Eq. |^ takes on the order of 
n 3 operations, so performing it for every pair of submaps 
and pair of bands takes on the order of N^N 2 n opera- 
tions. 

The procedure can in principle be parallelized for the 
efficient use of n proc processors, where n proc ranges any- 
where from Nb to ~ (N / n) 2 N 2 . The crucial use of paral- 
lclization comes in the dominant combining stage, which 
scales as (N/n) 2 N 2 , and involves the combination of 
~ 7}(N/n) 2 pairs of submaps. This type of independent 
pair loading can be efficiently shared on any number of 
processors lower than \(N/n) 2 . For LDB-type missions 
one might have N b ~ 10 and (N/n) 2 N^ ~ 4 x 10 4 and 
approximately 200 pairs of submaps to combine (if we 
indiscriminately retain all submap-submap correlations 
(see Section IV)). For supercomputers with n proc < 10 2 , 
every processor can be efficiently used. 

E. The Noise Matrix 

Our approach assumes that we begin with a pix- 
elized map and its corresponding noise covariance ma- 
trix. Map-making procedures usually produce a weight 
matrix, which is the inverse of the noise matrix. Invert- 
ing an arbitrary weight matrix takes on the order of ./V 3 
operations. Fortunately, this inversion only needs to be 
done once and is feasible for LDB-size maps. 

For larger maps, treatment of the weight matrix by 
general matrix inversion algorithms is impossible. Fast 
methods are being developed p3[ which rely on the ori- 
gin of the map weight matrix in the weight matrix of 
the time-ordered data. That is, the map weight ma- 
trix is A T N^ 1 A where N^} is (here) the time-stream 
weight matrix for a stationary noise process, and Au is 
the pointing matrix element that is one if at time-sample 
t the telescope is sampling map pixel i and zero other- 
wise. This special structure allows for each iteration of a 
conjugate gradient solution to be performed much faster 
than for an arbitrary matrix. 

Another possibility (suggested in (]]]) is to calculate 
the noise covariance matrix by Monte Carlo methods. 
In other words, one would make repeated simulations of 
the map noise and average those together to get any de- 
sired elements of the noise matrix. In addition to pos- 
sible speed advantages, this approach also has storage 
advantages since one probably needs fewer than A^/2 re- 
alizations to have a sufficiently accurate estimate of the 
noise. One may still need thousands of realizations of 
the noise — e.g., 20,000 realizations are required for the 
noise matrix elements to be accurate to within 1% of the 
diagonal. 
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F. Coarsening 



G. Iteration 



The amount of work to be done depends on the choice 
of number of resolution levels, which is governed by how 
many pixels are combined to form one pixel at the next- 
coarsest level. Greater coarsening between levels leads to 
fewer required operations, but at the expense of greater 
loss of information. Since the cost in computing time is 
slight for using the most modest coarsening possible while 
maintaining (roughly) square pixels, we always coarsen 
by averaging four pixels into one. This coarsening is also 
easily implemented in the HEALPix pixelization scheme, 
which we use p"4[ . 

In general, one can create a coarse submap A from a 
fine submap S as follows: 



A = W^awS 



(12) 



where a C i is one for all fine pixels i in coarse pixel c and 
zero otherwise, w is some weighting of the fine pixels and 
W = awa T . The coarse-fine and coarse-coarse noise 
covariance matrices are given by: 
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(13) 



and: 



(AA 3 
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Na rt n 

(if w = 1) 



= W- 1 awNw T a T W- 1 
= W" 1 (if w = N- 1 ) 



(14) 
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(if w = I) 



For optimal coarsening w = N^ 1 and for uniform av- 
eraging, w = I. We assume that we are coarsening four 
pixels into one and therefore that aa T = 41. We see 
that uniform averaging leads to noise covariance matri- 
ces that are easy to calculate. For optimal averaging we 
need to invert W which is substantially less challenging 
than inverting A -1 to get N since it has 1/16 the num- 
ber of elements. The same technique used for calculating 
N by exploiting the origin of TV -1 in time-ordered data 
(as explained in the previous subsection) can be used to 
get W- 1 @. 

Coarsening will usually result in pixel sizes that are 
large compared to the angular resolution of the instru- 
ment and therefore pixelization effects must be taken 
into account. Our treatment of the effect of pixeliza- 
tion on the signal correlation function is approximate, 
i.e., we use a pixel window which is the average of the 
evaluated power spectrum for every individual pixel. To 
prevent these approximations from creating errors in the 
final power spectrum, we ignore information from multi- 
pole moments greater than some critical value where the 
approximation introduces significant error. Pixelization 
effects are discussed in more detail in the Application 
section. 



A single application of the quadratic estimator of Eq. ^ 
might not result in a C/ that is sufficiently close to the 
likelihood maximum. This will be the case if the input 
Ci is too far from the likelihood maximum. Fortunately, 
iterative application of Eq. ^ has been shown to converge 
quite rapidly ]l0[| . 

When using the hierarchical decomposition approach, 
it is important that the iteration be done globally. That 
is, within each iteration, the power spectrum from each 
submap should be estimated using the same input C/ . If 
iteration is performed within the submaps, the combined 
result will suffer from cosmic bias p6[ , which results 
from the fact that uncertainties in C/ are not normally- 
distributed. For a normally-distributed variable, the cur- 
vature of the log of the likelihood function is independent 
of location in the parameter space (because the likelihood 
is a Gaussian). However, for C;, this curvature does de- 
pend on location. For larger values of C\ the curvature is 
smaller (i.e., the variance is larger). Thus, upward fluc- 
tuations should result in larger variances than downward 
fluctuations and so if one combines them together assum- 
ing Gaussianity, the net result is a downward bias due to 
the over-weighting of the downward fluctuations. 

The combination procedure of Eq. ^| implicitly assumes 
the estimates are normally distributed. We avoid the 
cosmic bias that might result from this assumption by 
weighting the downward and upward fluctuations equally. 
That is, we make sure to calculate T a \.fii> from the same 
Ci for all submaps. Thus any desired iteration, e.g., mo- 
tivated by a large correction from the input C'i, should 
be done globally. 

Since the uncertainty in C/ is non-Gaussian, specify- 
ing the Ci that maximizes the likelihood function, and 
(SCiSCi>), does not completely characterize the uncer- 
tainty. The uncertainty can be approximately character- 
ized by use of the "offset log-normal form" |lq ]. That is, 
error in the quantity Zi = In (C/ + x{) is approximately 
normally-distributed. The offset, a;;, is a measure of the 
noise contribution to the uncertainty, as opposed to the 
sample-variance contribution to the uncertainty. It can 
be calculated as outlined in lflq|. 



III. APPLICATION 

First we discuss the specifications for the simulated 
maps we used. Then we compare the results of HD with 
those of the exact method. 



A. Simulation Map Details 

We have applied our method using a Fortran code, 
which we have named Madcumba, to two different simu- 
lated maps, hereafter simulations A and B. In both cases, 
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the angular-power spectrum used was that of a CO BE 
normalized adiabatic, scale-invariant "lambda" cold dark 
matter (ACDM) model with fl A = 0.6, fl b = 0.05, 
^cdm = 0.35 and Hq = 75 km sec" 1 Mpc -1 and was gen- 
erated by the publicly available code CMBfast JlTj . The 
simulated signal maps were generated using the synf ast 
routine in the publicly available HEALPix package p4[ , 
at HEALPix N sidc = 256 (level 8, where N side = 2 lo ^ T ), 
in which the pixel solid angle is around (13.7') 2 , assum- 
ing a circular beam with full-width at half-maximum of 
20'. Finally, pixel noise taken from a Gaussian distribu- 
tion with zero correlations between pixels was added to 
the maps. The only significant differences between our 
two simulations are size and noise characteristics. 

The simulation A map has 10 4 pixels, is square in 
shape, and has a homogeneous noise variance of (20 [iK) 2 
in each pixel. Its relatively small size allows for the power 
spectrum to be estimated by the exact method (i.e., with- 
out dividing into submaps) using the MADCAP package 
[0]. This is compared to our calculation via HD into 
four equal-area square 2500 pixel submaps at full reso- 
lution and one coarse 2500 pixel submap at HEALPix 
A^dc = 128 (level 7) which covers the same area as the 
primary map. 

The simulation B map is also square in shape and 
has 4 x 10 4 pixels with a noise variance that is cosine- 
modulated throughout the map, varying from (20 /J,K) 2 
to 9 x (20 /J.K) 2 . Here, we decompose the primary map 
into sixteen submaps at full resolution, four submaps at 
the next coarser resolution and one coarsest resolution 
submap which covers the same area as the primary map 
but, by being two levels coarser, contains 1 / 16 th as many 
pixels. Thus, as with simulation A, we use n = 2500 pixel 
submaps. 



B. Comparison with Exact Method 

The top panel of Fig. [j] shows estimates of the powers 
from the individual submaps in simulation A. The bot- 
tom panel shows both the result of optimally combining 
them and the exact results obtained directly from the 
primary map. The solid line in both panels is the origi- 
nal power spectrum for the simulations. The differences 
between the power estimates are less than 20% of the 
standard error from the exact method. 

Not only do the power spectrum estimates agree quite 
well, but so do the estimates of the uncertainties. The 
error bars in Fig. |l| are the square roots of the diagonal 
elements of the respective Fisher matrices. In Fig. |] one 
can see how well entire rows of the exact and HD Fisher 
matrices agree. 

Clearly, the bigger the submaps at the finest resolution, 
the better this approach works. For a fixed length scale 
of interest, larger submaps contain a greater fraction of 
corresponding pixel pairs, and therefore achieve better 
spectral resolution (51). Unfortunately, the compute- 



time, when dominated by the combine procedure, scales 
as n and therefore as 1/S£ 2 (or possibly n 2 but with a 
much smaller pre-factor (see section IV)). Thus, choice 
of n can be critical. We studied how our information 
loss varies with n by comparing the error bars from the 
HD procedure to the full analysis for n — 2500 (the case 
above), n = 1600, and n = 900. The results are shown in 
Fig. ||. Note that for the n = 2500 case all the error bars 
are increased over the exact case by less than 10%. These 
larger error bars are consistent with the less than 20% 
differences (in units of variance of exact results) between 
the power estimates. 
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FIG. 1. Simulation A Results. Top panel: Power spec- 
trum estimates from four individual full-resolution 2500 pixel 
submaps (triangles) and one coarse 2500 pixel submap. Bot- 
tom panel: Power spectrum estimates from optimally com- 
bining the top-panel results (solid circles) and from the exact 
calculation (open circles). Note that in both panels, points 
are slightly shifted horizontally for clarity. 

The upward trend in error ratio with increasing band 
number is an effect of decreasing spectral resolution. To 
understand this, we examine Fig. ^ which shows the ratio 
of the HD over the exact method of the band contribu- 
tions to the total weight, Wb, where: 



(15) 



and the total weight of an experiment is W = J^b^b- 
For this analysis we switch to a finer binning of 25 bands, 
each with width 51 = 30. 

Note first the short-dashed line which is four times the 
ratio of Wb for one full resolution submap over the one for 
the primary map. If the four submaps were uncorrelated, 
we would expect this ratio to be ~ 1. However, since the 



5 



submaps are correlated, this ratio is greater than 1. We 
see that submap-submap correlations are more impor- 
tant at lower I than higher I values. 
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FIG. 2. Three rows of the Fisher matrix calculated exactly 
(solid lines) and also via the combination (HD) procedure 
(dashed lines) for simulation A. 
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FIG. 3. Error bars from HD divided by error bars for the 
exact analysis. Each case represents a primary map with 4n 
pixels divided into four n-pixel full resolution submaps and 
one coarsened n-pixel map where n = 30 x 30 (triangles), 
40 x 40 (squares) or 50 x 50 (hexagons). 



Though individual elements of the Fisher matrix may 
be larger for a sub-optimal method than an optimal one, 
we know that the contribution from a given band to the 
total weight can not be larger. Thus, the best we could 
hope for is that the ratio of Wb for the HD method over 
the exact method is near unity. We see from Fig. |J that 
it is everywhere greater than 0.97. Thus the fact that the 
combine procedure gives at most 10% larger error bars 
(20% larger variances) in Fig. || can not be due to any 
reduction in the total weight (which we see is negligible), 
but must be due to how each Wb is distributed among the 
Fbf ■ In particular, it is the lower spectral resolution of 
the smaller submaps which results in the Wb being more 
spread out within a Fisher matrix row and less concen- 
trated in the diagonal element Fbb as is clear from the 
7th row plotted in Fig. g. 
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FIG. 4. W b /W§™ ct where W b = J2v F w- The Wb ' s are 
from analysis of the simulation A map, but into finer bins of 
width SI = 30. The short-dashed line is 4W b /W^ ct where 
Wb is just from analysis of one of the four full-resolution 
submaps; the long-dashed line is Wi,/W 6 cxact where Wb is 
from analysis of the coarse resolution submap; the solid line 
is Wb/W^ where Wb is from combining information from 
all five submaps. 

A plot of Fbb ratios (similar to the Wb ratio plot of 
Fig. |]) shows that the cost of this weight redistribution 
within a Fisher matrix row is a decrease in the diagonal 
Fisher elements (in the i = 250 to I — 600 range) to 
80-85% of the exact ones. Not only is Fbb suppressed 
then, but the larger off-diagonal elements also lead to 
larger diagonal elements of F~ x . With broader bands 
(such as those used for Fig. ||), the error-bar increase 
due to degraded spectral resolution is not as severe. The 
effect of the larger off-diagonal elements propagates from 
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band-to-band and is least significant at the lower bands 
which are benefiting from the full spectral resolution of 
the coarse submap. 

Also in Fig. [| one can see that the pixelization effects 
can be fairly severe. This is unfortunate since we only 
treat the pixelization influence on the signal-correlation 
matrix, S, approximately. Our treatment is that pro- 
vided with the HEALPix package, which assumes that 
the correlation between two pixels only depends on the 
angular distance between them and not on their orienta- 
tion. This is an approximation for two reasons: the pixels 
are anisotropic, and their shapes depend on their loca- 
tion. The validity of the approximate window-function 
can vary from submap to submap if the submaps are not 
large enough to have a representative sampling of all pixel 
shapes. This is another reason to use large submaps. We 
take each cross-level pixel window function to be the ge- 
ometric mean of the two auto-level pixel window func- 
tions. 

Because our treatment of pixelization effects is approx- 
imate, we throw out information from coarse submaps at 
a conservatively low I value. In simulation A, for exam- 
ple, powers from the coarse resolution submap were only 
considered for I < 225. In the final combined results, the 
higher bands only use information from the four fine reso- 
lution submaps. We eliminate the influence of the coarse 
submap on the higher bands by inserting very large num- 
bers into diagonal elements of the (J 7 ) a i,a'l' matrix. 
This marginalization technique is described in Appendix 
A of [[l0| and can be understood as artificially adding 
some noise to these particular bands so as to give them 
very low weight. 

The upturn in Figure [| after £ — 225 where the coarse 
submap information is no longer used indicates that there 
may be an advantage to keeping the coarse submap in- 
formation to yet higher £. This would require a more 
accurate treatment of the pixel effect on the signal cor- 
relation function and its derivatives with respect to C;. 
One way to do this, which would be fairly easy to imple- 
ment and not cause significant speed reduction, would 
be to avoid using pixel window functions by calculating 
coarsened signal matrices directly from finer ones. For ex- 
ample, if the fine signal matrix is s then the next-coarser 
signal matrix, S, must be 



where a C i is one for all fine pixels i in coarse pixel c and 
zero otherwise. Once again, we are summing four pix- 
els into one. The only approximations here come from 
approximations made in calculating s. If these approxi- 
mations were acceptable for the finer level, they will cer- 
tainly be adequate for the coarser level. Keeping the 
coarse level information out to higher bands may be very 
important for extension to megapixel maps because it is 
the only other way to improve spectral resolution besides 
increasing n for the highest-resolution submaps. 



IV. ANALYSIS OF GENERAL MEGAPIXEL 
MAPS 

The map from simulation A has homogeneous white 
noise. Below we will discuss results from HD analysis 
of the map from simulation B in which the noise is in- 
homogeneous but still uncorrelated. Yet we believe HD 
will work well on realistic maps with correlated noise. In 
this section we briefly make the case for the success of 
HD in the presence of correlated noise and then move 
on to discuss how HD can be made to work for primary 
maps with 100 to 1000 times more pixels than the sim- 
ulation A map. We will see that further approximations 
are necessary, but that they are likely to work well. 

Even though our applications of HD have only been on 
simulated maps with uncorrelated noise, we believe that 
HD will work well on realistic maps with correlated noise. 
This is easiest to see for correlations on length scales 
smaller than the size of the smallest submaps. Longer- 
range noise correlations will not be treated accurately 
in the analysis of the smallest submaps. But this does 
not matter because the effect will only be on lower-£ 
bands where the smallest submaps do not have much 
weight. The affected bands will be those determined by 
coarser and larger submaps that will once again be large 
compared to the correlation length. Thus the prospects 
for HD on maps with correlated noise are quite good. 

Applying HD as we have described it to megapixel 
maps is prohibitively expensive in terms of the demand 
on computing resources. A rough scaling argument is 
sufficient to demonstrate this point. In the megapixel 
regime, we are strongly dominated by the calculation of 
all the elements of ^F^la'l" ^he number of elements in 
this matrix is ~ {N/n) 2 Ng. On an SGI Origin 2000, the 
calculation of a single element of F~~iu'i' takes 188 sec 
(n/2500) 3 on a single MIPS R12000 300 MHz processor 
where n is the number of pixels in a submap. Thus the 
wall-clock time is 

l-W^V " V^f^ (17) 
Woe/ V3 x 10V V2500/ \W00j v ; 

where we have assumed the efficient use of n proc proces- 
sors Jl8| ], Thus the need to avoid exact calculation of 
every element of T~l a , v is apparent. 

To make the case for the likely success of fast approxi- 
mations to T~l a iy we turn to the results from simulation 
B. In Fig. ||, we plot four power spectra: one is the result 
of optimally combining the individual power spectra; one 
is the power spectrum of the coarsest submap; the other 
two are the result of a simple averaging of the power spec- 
tra for the submaps within a given resolution level as if 
they were independent. Again, the solid line represents 
the original input power spectrum. 

We find the very good agreement between simple aver- 
aging and the exact combination (for the highest bands) 
to be very encouraging because it is strong evidence that 
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signal correlations between non-overlapping submaps are 
not very important. We certainly see they are not im- 
portant in the highest bands which are influenced only 
by submaps with no spatial overlap (since the submaps 
are all at the same resolution level) . If any given band is 
only influenced by at most two or three levels and we only 
need to calculate correlations for non-zero submaps then 
the vast majority of submap pairs can be ignored. Even 
if some cannot be ignored, their relative insignificance 
means that there are probably crude approximations to 
them that will work well. 
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can probably be treated approximately in an insignificant 
amount of time. Development and study of these approx- 
imations is probably necessary for practical application 
of HD to megapixel maps. 

We also see from Fig. || that even when there is a mix 
of resolution levels influencing a band, using just one of 
those levels provides a rough approximation. A fairly 
good "quick-and-dirty" power-spectrum estimator is the 
coarsest submap's power spectrum for band 1, the coarse 
submaps' power spectrum for bands 2 and 3, and the 
finest submaps' power spectrum for bands 4 to 8. Such 
an estimator has its applications, for example, finding a 
Ci that is close enough to optimal that one only needs a 
single iteration of the exact HD procedure. 

The scaling of t with N in Eq. [18| is linear if n is fixed. 
But if we fix spectral resolution and the area of the pri- 
mary map, then n cx N and therefore t cx N 3 once again! 
Or, at fixed N and primary map area, t oc ( 1 /<5Z ) 4 . Our 
fiducial choice above of n = 5000 corresponds for Planck 
with N = 10 7 and r p = 3.5' to Si = 45. This may be suf- 
ficient since physical models have fairly smooth power- 
spectra. We see that the degree to which degraded spec- 
tral resolution affects our ability to discriminate between 
different models is a crucial issue for the applicability of 
HD to Planck. We remind the reader that spectral reso- 
lution is the only thing that is significantly compromised 
with HD; Fig. shows the total weight from each band 
is within a few percent of optimal. 



V. COMPARISON WITH OTHER METHODS 



1 

FIG. 5. Results from HD of the 200 by 200 pixel simulation 
B map. There are sixteen submaps at the finest resolution 
level (level 8 (13.7') pixels), four at the medium level (level 
7 (27.5') pixels) and one at the coarsest level (level 6 (55' 
pixels)). Triangles and squares represent the result of doing 
a naively weighted average of the power spectrum estimates, 
i.e., neglecting correlations, from the sixteen fine submaps and 
the four coarse submaps, respectively. Pentagons represent 
results for the one coarsest submap. Filled circles show the 
results of the optimal combination of all submaps in which 
all power-spectrum correlations are computed exactly. As in 
Fig. |l], points are shifted horizontally for clarity. 

Calculating only the correlations between overlapping 
submaps at adjacent resolution levels takes time 



t = 78 hours 



200 



fc proc 



N 



\ /_n_y 
J V 5000/ 



Nb 
40 



(18) 



where for each of the Nb bands only the nearest AiV;, 
bands are considered p9| . Calculating correlations be- 
tween overlapping submap pairs whose resolution levels 
differ by 2 will, at most, double the time. Correlations be- 
tween non-overlapping map pairs may be significant but 



The HD method has many advantages over other fast, 
approximate methods. Perhaps the chief advantage is 
its ability to handle maps with correlated noise. Its main 
disadvantage is spectral resolution. To understand better 
these competitive advantages/disadvantages it is worth 
spending some time discussing these other methods — 
especially since we will see they are somewhat comple- 
mentary and hence a hybrid approach may be useful. 

This discussion of other methods is facilitated by writ- 
ing down the following generalization of Eq. ^: 



Ci 



and Eq. [5| 



-Tr 



W ( AA T -N)W 



dd dC v 



dC 
dC v 



(19) 



(20) 



These equations specify a general unbiased quadratic es- 
timator, with pixel pair-weighting determined by W. 
The Fw matrix is derived by demanding that the es- 
timator be unbiased (((7 ; estlmatc ) = Q). In general, its 
inverse is not equal to (SCiSCy) which is instead given 
by 
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1 - ^F^\A V ,CA VI ,C\ 



(21) 



where A x = W-§^W, similar to Eq. fj]. 

For the minimum-variance estimator, W — C _1 . The 
"correlation-function" approach (CF) of uses the sim- 
pler W — I in pixel space pofl . Spherical-harmonic 
transforming the map and averaging |a; m | 2 's over m 
uses W = I in spherical-harmonic space. The multi- 
scale method we have just described above likewise cor- 
responds to a choice of W, although this W is not easily 
written down. 

It is worth pointing out that the estimator for CF re- 
quires on the order of N 2 N 2 operations where Nb is the 
number of £-bands. One can get rid of the N 2 factor 
by rewriting it as an estimator for C{9) in fine bins of 9 
and then Legendre-transforming the result, as was done 
in jjj. Further computational accelerations are possible 
by use of KD-tree search techniques which use coarse- 
graining at large distances pl|j22|] . In addition, fast 
spherical harmonic transforms lead to great time-savings 
in harmonic methods. 

However, the simplicity of these other choices for W 
does have drawbacks. Specifically, high-noise areas and 
low-noise areas make equal contribution to the estima- 
tor. To date, the success of these methods has only been 
demonstrated on simulations with homogeneous white 
noise. The first obvious improvement to CF is to re- 
place Wij = Sij with Wij — 1/afSij (in pixel space) as 
suggested in 0. 

What is less obvious is how to weight pixel pairs in the 
presence of correlated noise. This is where further devel- 
opment of the CF approach is most needed. One possible 
route to pursue is band-diagonal choices of Wij which 
capture the spatially-local noise correlations. Computa- 
tion with band-diagonal W's can still be quite fast; they 
are still order of TV 2 as long as the bandwidth is less 
than \/~N. Perhaps longer-range correlations could be 
included in some hybrid scheme of HD and CF. Here CF 
(with band-diagonal Wij) would be used on the primary 
map and then HD would be used to calculate lower-£ 
values which may have been affected by long-range noise 
correlations. This hybrid scheme also has the advantage 
of complementing HD where its spectral resolution is low- 
est @. 

Although the calculation of Ci is fast with simple 
choices for W, the calculation of the error covariance ma- 
trix (Eq. 21) is slow; i.e. the number of operations scales 
with N 3 because of the matrix multiplications. One op- 
tion is to estimate the errors by Monte-Carlo methods 
. Another is to combine the CF and HD approaches in 
yet another way: use CF as a means to produce an input 
power spectrum sufficiently close to the optimal one that 
only a single iteration of HD is required. 



VI. CONCLUSIONS 

We have concentrated on developing a fast and reli- 
able method for calculating power spectra and their un- 
certainties from maps with N = 10 4 to 10 5 pixels. Meth- 
ods that work in this regime are of immediate practical 
importance. Our tests show very good agreement with 
exact methods at the lower end of our N range where 
the exact analysis is feasible on a supercomputer. The 
HD method is the only existing method for calculating 
a power spectrum and its uncertainties from general, in- 
homogeneous correlated noise patterns with maps of this 
size in reasonable amounts of time J24|. 

We have not tested our method on maps with cor- 
related noise. But since noise-correlations are taken 
into account exactly within each submap, we expect our 
method to handle correlated noise effectively, unlike the 
other fast methods mentioned above. These expectations 
will be put to the test soon as HD is applied to exist- 
ing datasets from LDB flights, such as ^4rc/ieops[j] and 
TopHa^. 

The local nature of the method has some advantages 
for controlling contamination of the final power spectrum 
result. In the extreme, one can simply cull submaps with 
the largest foreground contamination. Less drastically, 
one could down-weight the power spectrum determina- 
tions from submaps according to the suspected level of 
contamination. 

To summarize, we have developed and investigated an 
HD method of power-spectrum estimation. We have 
demonstrated that for LDB-size maps HD is sufficiently 
fast and insignificantly sub-optimal. Its main advan- 
tages over other fast methods are its generality (includ- 
ing its ability to handle correlated noise) and the fact 
that the power spectrum uncertainties are calculated di- 
rectly. Application to larger maps will rely on further 
approximations which we expect to work well but require 
further investigation. The main disadvantage to HD is 
the degraded spectral resolution at the smallest angular 
scales. The impact of this degradation on parameter- 
determination also warrants further investigation. The 
combination of HD with other methods may be fruitful. 

Madcumba, a Fortran 90 implementation of the HD 
procedure, will be made available for public use. Com- 
ments and questions should be directed to O. Dore at 
dore@iap.fr. 
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